Introduction


The oldest records of human language transitioning from symbolic commands to complex patterns
date back 50,000 years (Klein, 2017). By contrast, machine language moved from the
symbolism of 1s and 0s to complex patterns only in the 1950s (Jones, 1994). The vast gap between
the long development of human language and the recent growth in machine complexity
makes it necessary to examine the intricacies of language acquisition. Since 2000,
advances in deep learning and big data have accelerated the development of Artificial Intelligence. These
advances allow machines to learn from data through language representations and to carry out
human-like interactions using Natural Language Processing techniques (Tohma and Kutlu, 2020).
However, for a machine to truly understand language and its intricacies, multiple approaches are
required. Chomsky, a linguist and thinker, compared human and machine-produced texts, stating that
"the correct explanations of language are complicated and cannot be learned just by marinating in big
data" (Chomsky et al., 2023).


Education as a scientific framework and Bloom's Taxonomy as a benchmark can be used to
understand the distinction between AI-generated and human-produced language. Bloom proposed the
‘taxonomy of educational objectives,’ which has been key in shaping question formulation and
the attainment of competencies in education (Bloom, 1956). Criticism prompted its revision
by Anderson and Krathwohl in 2001 to reflect changing pedagogy (Anderson and Krathwohl,
2001). The Revised Bloom's Taxonomy (RBT) organizes cognitive processes into six
categories: remember, understand, apply, analyze, evaluate, and create. However, the taxonomy's
complexity and diversity can make it difficult to express, classify, implement,
and evaluate objectives within its framework (Bumen, 2006). Verbs play a significant role in
developing a common language for competencies and in formulating appropriate questions
for each category of the taxonomy. Stanny analyzed how frequently English verbs appeared in the
verb lists published on the top 30 sites returned by Google searches and ranked the verbs by their
association with RBT levels (Stanny, 2016). The same study also concluded
that verbs sequenced by RBT level remain inadequate to account for learning differentiation and
academic level because of, among other factors, the flexibility of language and the multiple
connotations that the same verb takes in different contexts.


Some studies use the Revised Bloom's Taxonomy as a benchmark when evaluating the educational and
scientific consistency of AI-generated content. A recent study assessed the
scientific acceptability of multiple-choice questions (MCQs) generated automatically by
ChatGPT and found that their quality was similar to that of MCQs written by
subject matter experts (Kıyak, 2023). Another study, which examined the potential impact of
controllable text generation with Large Language Models
(LLMs) on education, concluded that it could be an effective tool. However, the experiment also showed
instances where the LLM repeated similar queries despite variations in the applied control parameters
(Elkins et al., 2023). A third study, focused on evaluating LLM performance,
proposed a visualization technique called LLMMaps to assess the
knowledge capabilities of LLMs across different subfields in more detail. LLMMaps was evaluated
through interviews and user feedback in two separate studies involving medical
researchers and language model developers. On the SciQ dataset, each model was
assessed on 997 questions, with ChatGPT achieving a performance of 92.68% (Puchert et al.,
2023).


Various automated classifiers for the cognitive levels of
RBT have been developed using machine learning techniques. In one rule-based study, questions
were categorized according to the original (unrevised) form of Bloom's taxonomy after preprocessing
with natural language processing techniques. The approach handles
overlapping verb keywords across the categories of Bloom's taxonomy by means of category weighting, and
the authors conclude that further rule development and testing are needed to improve the system's effectiveness
(Omar et al., 2012). In another study, questions were classified by the cognitive domain of Bloom's
taxonomy using K-Nearest Neighbors, Logistic Regression, and Support Vector
Machines. The study used
vectorization techniques such as TFPOS-IDF and word2vec for feature engineering. The combination of
W2V-TFPOSIDF and SVM achieved an F1 score of 0.83 on one dataset and 0.89
on another (Mohammed and Omar, 2020). To assess the Naive Bayes classifier
as a model for predicting the level of exam questions in Bloom's Cognitive Domain
Taxonomy, researchers evaluated different vectorization methods with
10-fold cross-validation. The dataset comprised 300 midterm and final exam questions, with features
extracted using the TF-IDF method. According to their evaluation, N-Gram
TF-IDF achieved a precision of 85%, and Words TF-IDF achieved a recall of 82% (Aninditya
et al., 2019). Another work used pre-trained word embeddings, Long Short-Term Memory
(LSTM), and a keyword/verb dictionary to represent the six levels of Bloom's taxonomy. The
evaluation yielded F1 scores of 74% for CLO classification and 87% for question
classification (Shaikh et al., 2021). A further study applied a Convolutional Neural Network (CNN) and
an LSTM to classify 844 questions from a Software Engineering course according to Bloom's Taxonomy.
The proposed CNN model achieved an accuracy of 80% in predicting
the cognitive dimensions of Bloom's Taxonomy. Although the LSTM model was more successful
during training in predicting the knowledge dimensions of Bloom's Taxonomy, the CNN model
demonstrated higher accuracy during testing (Laddha et al., 2021).


The objective of this study is to evaluate the question generation capability of three prominent
chatbots, namely ChatGPT, Gemini, and Copilot, by classifying their generated questions and
measuring how well they align with the cognitive domain of the Revised Bloom's Taxonomy.


Material and Methods 
Artificial Intelligence Chatbots 


Recently, AI chatbots have become capable of emulating complex language use, opening
numerous possibilities for applications across various fields, including education. AI chatbots such
as ChatGPT, Gemini, and Microsoft Copilot show significant potential in such applications. However,
the selection of an appropriate AI chatbot must be based on the requirements of the specific application and
domain (Hiwa et al., 2024).


ChatGPT, developed by OpenAI, is a language model. It employs a transformer architecture
to understand and generate human-like responses based on the input it receives (OpenAI,
https://chat.openai.com/).


Gemini, initially introduced as Bard and built on the Pathways Language Model 2 (PaLM 2),
is designed to generate human-like responses to user queries. Integrated with Google's search network, it has
access to up-to-date online information (Google, https://gemini.google.com/app).


Microsoft Copilot, initially introduced as Bing Chat in a search engine and browser, has
unified different chat services and has been integrated into an operating system with a dedicated key.
It uses a model based on OpenAI's GPT-4 and operates on a freemium model (Microsoft,
https://copilot.microsoft.com/).


Datasets 


The reference dataset includes Chris Drew's work (Drew, 2023), Ertekin's report (Ertekin, 2017), and
the learning outcomes of the Ministry of National Education's environment course curriculum (MEB, 2018), all within
the scope of the Revised Bloom's Taxonomy. The AI-generated dataset consists of questions designed
to align with the levels of RBT's cognitive domain. The question generation guidelines rely on
definitions and verbs from the literature. Additionally, this dataset uses news topics from the
Science Daily website (https://www.sciencedaily.com/). The site organizes news into 12 categories,
and for this study, 14 topics were selected from each category to ensure diversity.


Prompt Design 


The definitions and abilities of the categories/dimensions used to prepare and classify Turkish
questions according to the Revised Bloom's Taxonomy are based on the work of Anderson and
Krathwohl. In RBT, verbs are a common reference point for classification. Providing
the verbs to be used in the Turkish questions within the prompt made the generated questions more consistent.
The verbs were taken from the Turkish translations of the level descriptions in the literature and from Stanny's
compilation.


Cognitive Process Level [1 to 6] of the Revised Bloom's Taxonomy:
[Name of the cognitive process category]

[Definition of the cognitive process category]

I want you to generate questions appropriate to this level.

In keeping with the language used in instruction, you must phrase the questions in the imperative mood.

You must ensure that the questions reflect the cognitive process level and contain appropriate verbs.

You must use these verbs in the questions: "[category verbs]"

You must not use question words; complete the questions with the verbs I provide.

Create one question for each topic.

Express each question with a single verb.

Try not to repeat the same verb in every question.

Make sure the question and its verb form a coherent meaning.

Keep the questions and topics in table format.

Columns: Question, topic

Make sure the questions are at the [..] level. The [..] level is a [low/high] cognitive
level. [category abilities] and it aims to elicit these capacities through
questions.

Following the instructions, you will create one question from each topic.

[Topics]


Figure 1. Prompt design for preparing and classifying Turkish questions according to the Revised
Bloom's Taxonomy (the prompt was issued in Turkish and is shown here in English translation).
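The bracketed slots in Figure 1 can be filled programmatically for each category and topic list. The following is a minimal sketch only: the `Category` record, the `build_prompt` helper, and the abridged English template wording are hypothetical and are not taken from the study, which issued its prompt in Turkish.

```python
from dataclasses import dataclass
from typing import List

# Abridged, illustrative English rendering of the Figure 1 skeleton (an assumption).
PROMPT_TEMPLATE = """Cognitive Process Level {index} of the Revised Bloom's Taxonomy: {name}
{definition}
I want you to generate questions appropriate to this level.
Phrase the questions in the imperative mood.
Use these verbs in the questions: "{verbs}"
Do not use question words; complete the questions with the verbs I provide.
Create one question for each topic, each expressed with a single verb.
Keep the questions and topics in a table with columns: Question, Topic.
The {name} level is a {order}-order cognitive level.
Topics:
{topics}"""


@dataclass
class Category:
    index: int          # 1..6, position in the cognitive process dimension
    name: str           # e.g. "Remember"
    definition: str     # definition text supplied by the prompt author
    verbs: List[str]    # verbs allowed for this category
    high_order: bool    # hypothetical flag for the [low/high] slot


def build_prompt(category: Category, topics: List[str]) -> str:
    """Fill every template slot for one category and a list of topics."""
    return PROMPT_TEMPLATE.format(
        index=category.index,
        name=category.name,
        definition=category.definition,
        verbs=", ".join(category.verbs),
        order="high" if category.high_order else "low",
        topics="\n".join(f"- {topic}" for topic in topics),
    )
```

Keeping the verbs and topics as lists makes it straightforward to issue one prompt per category while reusing the same topic list across all six levels.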


Vector Representation Method


A sentence vector is a numeric representation of the words in a sentence, obtained with
text-embedding techniques. In this study, the pre-trained Turkish BERT
model, without fine-tuning, is used to obtain the vector representation of the input questions.


BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based deep learning
model used in the field of natural language processing (Devlin et al., 2019). The
Turkish BERT (BERTurk) model is a version trained specifically to perform better on texts in
the Turkish language. A hybrid model with BERTurk has demonstrated the highest performance in
sentiment analysis within Turkish question-answer systems (Tohma et al., 2023). Other work
illustrates how BERT improves results across various Turkish NLP tasks, such as sentiment
analysis, cyberbullying identification, text classification, emotion recognition, and spam detection,
by reducing the need for extensive pre-processing compared to traditional machine learning methods
(Kutlu et al., 2017; Ozcift et al., 2021).
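The text does not state how the per-token BERT outputs are reduced to a single sentence vector. One common choice is masked mean pooling, sketched below in plain Python under that assumption; taking the vector of the first ([CLS]) token is another frequent option. The function name `mean_pool` and the toy inputs are illustrative only.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("one mask entry is required per token vector")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dim = len(kept[0])
    # Component-wise mean over the kept tokens.
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Two real tokens and one padding token (mask 0) in a toy 2-dimensional space.
sentence_vector = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
```

Here the padding token is excluded, so `sentence_vector` is the mean of the first two vectors only.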


Similarity Metrics 


The choice of similarity metric depends on the characteristics and requirements of
the text similarity problem. Text similarity is measured by calculating the distance between
the vector representations of texts. Cosine similarity is one of the most frequently used measures for
this purpose (Wang and Dong, 2020).


The cosine similarity metric calculates the similarity between two vectors based on the angle
between them in vector space. In general the value lies between -1 and 1; the scores reported in
this study fall between 0 (lowest) and 1 (highest). The cosine similarity between
two vectors is calculated using the following formula:


$$\text{cosine\_similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$
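As a minimal sketch, the formula can be computed directly from two equal-length vectors using only the standard library. The function below is illustrative and is not the implementation used in the study.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return (a . b) / (||a|| * ||b||) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Parallel vectors score 1, orthogonal vectors score 0, and opposite vectors score -1, regardless of vector length.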


Machine Learning


Machine learning is a field that enables computers to learn without being explicitly programmed
(Samuel, 1959). It uses data to run algorithms that iteratively improve their performance.
Selecting the appropriate algorithm depends on the specific characteristics of the dataset.


SVM is a kernel-based machine learning model for classification and regression tasks.
Various kernel types are used to transform datasets into high-dimensional spaces in different ways
(Cervantes et al., 2020). Depending on the characteristics of the dataset, the kernel can be selected
according to the class distribution. The kernel function used in this study is:


Linear kernel: $K(x_i, x_j) = x_i \cdot x_j$
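The linear kernel is simply the dot product of two input vectors. As an illustrative sketch (the helper names are not from the study), the kernel and the resulting Gram matrix for a small set of vectors can be written as:

```python
from typing import List, Sequence


def linear_kernel(x_i: Sequence[float], x_j: Sequence[float]) -> float:
    """K(x_i, x_j) = x_i . x_j, the dot product of the two vectors."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x_i, x_j))


def gram_matrix(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise kernel values; entry [i][j] equals K(samples[i], samples[j])."""
    return [[linear_kernel(a, b) for b in samples] for a in samples]


toy_gram = gram_matrix([[1.0, 0.0], [1.0, 1.0]])
```

Because the kernel is a plain dot product, the Gram matrix is symmetric and no implicit mapping to a higher-dimensional space takes place.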


Removing certain parts of the sentence


According to RBT, the parts of a sentence that contain verbs and verbals are the defining elements of the
cognitive dimension for the classification task. The questions generated for our data are independent
of the reference questions in terms of topic. The irrelevant (topic) parts of the sentences are therefore filtered out
by determining parts of speech and dependency relations within the sentences according to specific
linguistic criteria. The filtering used the spaCy library adapted to the Turkish language (Altinok,
2023), in particular the token attributes for syntactic dependency (token.dep_) and part of speech (token.pos_).
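The exact linguistic criteria are not listed in the text, so the following sketch only illustrates the general idea. It assumes, hypothetically, that tokens tagged as verbs or auxiliaries, or attached as the sentence root, are kept and everything else is dropped. The `Token` record stands in for a spaCy token and mirrors its `pos_` and `dep_` attributes; the tags in the example are assigned by hand rather than produced by a parser.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Token:
    """Stand-in for a parsed token with part-of-speech and dependency labels."""
    text: str
    pos_: str
    dep_: str


# Assumed filter criteria; the study's actual rules are not given in the text.
KEEP_POS = {"VERB", "AUX"}
KEEP_DEP = {"ROOT"}


def keep_verbal_parts(tokens: Iterable[Token]) -> List[str]:
    """Keep tokens that carry the verb; drop topic-bearing tokens."""
    return [t.text for t in tokens if t.pos_ in KEEP_POS or t.dep_ in KEEP_DEP]


# "Dünyanın şeklini açıklayın." (Explain the shape of the Earth.)
question = [
    Token("Dünyanın", "NOUN", "nmod"),
    Token("şeklini", "NOUN", "obj"),
    Token("açıklayın", "VERB", "ROOT"),
]
filtered = keep_verbal_parts(question)
```

With these hand-assigned tags only the imperative verb survives, which is the part of the question that signals the cognitive level rather than the topic.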


Results 


Distinguishing between AI-generated and expert-prepared questions can be challenging due to
similarities in structure and potential ambiguity in language usage. To address this challenge, we 
carried out a comparison study that concentrated on the subtle differences in language and structural 
patterns between questions created by AI models and questions written by human experts. 


Similarity Results 


Utilizing a metric for comparing texts automates the measurement of similarity. In this context, the 
similarity metric method was used as an alternative analysis to compare the structural and semantic 
features of questions generated by AI models with those prepared by subject matter experts. The 
reference questions used in this section are drawn from the study by Mercan, which includes questions 
in each cognitive dimension on the same topic (Mercan, 2019). 


Table 1. Cosine similarity metric comparison


Reference        Copilot  Gemini  ChatGPT
0 (Remember)     0.526    0.529   0.567
1 (Understand)   0.624    0.423   0.594
2 (Apply)        0.555    0.520   0.370
3 (Analyze)      0.628    0.616   0.665
4 (Evaluate)     0.704    0.492   0.458
5 (Create)       0.667    0.4785  0.328


Table 2. Sample reference questions and questions generated by the AI tools at each Bloom's Taxonomy
level (translated from Turkish).


Remember
Reference: Who first put forward the idea that the Earth is round?
Copilot: Recall the basic facts about the shape of the Earth.
Gemini: Describe the shape and layers of the Earth.
ChatGPT: Define the shape of the Earth.

Understand
Reference: Explain the method of the person who proved that the Earth is round.
Copilot: Explain and interpret the shape of the Earth.
Gemini: Show the properties and relationships of the Earth's layers with a diagram.
ChatGPT: Show the shape of the Earth.

Apply
Reference: Prepare a chronology of the scientists concerned with [...] the Earth being round.
Copilot: Show the shape of the Earth with a concrete example.
Gemini: Build a scale model showing the layers of the Earth.
ChatGPT: Determine a route on the world map and explain why you chose it.

Analyze
Reference: Compare the different ideas about the Earth's shape and its place in the universe and interpret the similarities and differences.
Copilot: Separate the components that make up the shape of the Earth and examine how they come together.
Gemini: Analyze the properties and relationships of the Earth's layers by reconstructing them in a diagram.
ChatGPT: Compare the different geographic regions on the world map and identify their characteristics.

Evaluate
Reference: Evaluate the implications of the knowledge accumulated from the past to the present about the Earth's shape and its place in the universe.
Copilot: Evaluate and compare the important features related to the shape of the Earth.
Gemini: Find the shortcomings of a model showing the properties and relationships of the Earth's layers and offer suggestions.
ChatGPT: Evaluate the advantages and disadvantages of the different geographic regions on the world map.

Create
Reference: If you had to prove that the Earth is round, which methods would you use today?
Copilot: Create an original explanation or model about the shape of the Earth.
Gemini: Create an interactive model showing the layers of the Earth.
ChatGPT: Plan sustainable development projects in the different geographic regions on the world map.


According to the results in Table 1, Gemini (0.423 to 0.616) and ChatGPT (0.328 to 0.665)
showed low to moderate similarity to the questions prepared by experts and sometimes deviated clearly
from them, indicating considerable variability. Copilot showed moderate to high similarity
(0.526 to 0.704), suggesting greater congruence in structure and meaning. These findings provide an
empirical example showing that AI-generated questions can be distinguished from those prepared by experts
based on these metrics. Even for the 'analyze' cognitive process, which achieves
the highest average score, these AI models are unreliable sources to reference. Sample reference
questions and AI-generated questions for each Bloom's Taxonomy
level are shown in Table 2.
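The statement about the highest average score can be checked by averaging each row of Table 1 across the three chatbots. The short sketch below does this with the values copied from the table; the variable names are illustrative.

```python
# Cosine similarities from Table 1, keyed by reference index:
# (Copilot, Gemini, ChatGPT).
TABLE_1 = {
    0: (0.526, 0.529, 0.567),
    1: (0.624, 0.423, 0.594),
    2: (0.555, 0.520, 0.370),
    3: (0.628, 0.616, 0.665),
    4: (0.704, 0.492, 0.458),
    5: (0.667, 0.4785, 0.328),
}

# Mean similarity per reference question across the three chatbots.
row_means = {ref: sum(scores) / len(scores) for ref, scores in TABLE_1.items()}

# Reference index with the highest mean similarity.
best_reference = max(row_means, key=row_means.get)
```

The highest row mean, roughly 0.64, belongs to reference index 3, the row that the text associates with the analysis level.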


Classification Results


Next, a classification analysis was performed using machine learning algorithms.
For each RBT cognitive domain class, 168 examples were generated. BERT vectors of the sentences
served as input, with the cognitive domain levels serving as labels. The training phase employed the SVM
algorithm with a linear kernel and a "one-vs-one" (ovo) decision function shape. When the
reference questions were used as test data, one hundred percent of the generated data was used for training.
When the generated data was classified against itself, eighty percent of the data was used for training. A
filtering approach with spaCy was applied when classifying the reference questions in order
to avoid topic-oriented classification. In contrast, the questions generated by artificial
intelligence were classified using entire sentences without such filtering, with each chatbot's
output evaluated separately.
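The dataset sizes implied by this setup can be worked out with a few lines of arithmetic. The sketch below assumes that the 168 examples per class correspond to the 12 categories of 14 topics described for the dataset, and that the eighty percent split is taken over all six classes together; how the split was rounded or whether it was stratified is not stated in the text.

```python
from itertools import combinations

LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]
CATEGORIES = 12            # news categories on the topic source
TOPICS_PER_CATEGORY = 14   # topics selected from each category

examples_per_level = CATEGORIES * TOPICS_PER_CATEGORY   # 168
total_examples = examples_per_level * len(LEVELS)       # 1008 per chatbot

# Assumed 80/20 split over the whole generated set, using integer arithmetic.
train_size = total_examples * 80 // 100                 # 806
test_size = total_examples - train_size                 # 202

# A one-vs-one scheme trains one binary classifier per pair of classes.
ovo_pairs = list(combinations(LEVELS, 2))               # 15 pairs
```

With six classes the one-vs-one scheme therefore trains 15 binary classifiers, and a held-out set of about 202 questions gives roughly 33 or 34 test questions per class.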


Key steps in the classification process were: (1) tokenizing sentences and converting them
into BERT embeddings; (2) classifying the resulting vectors using the SVM algorithm; and (3) evaluating
performance using metrics such as accuracy, F1 score, precision, and recall (see
Table 3).


Definitions of the metrics are as follows: 

Precision: TP / (TP + FP) 

Recall (Sensitivity): TP / (TP + FN) 

F1 Score: 2 * (precision * recall) / (precision + recall) 

TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives. 
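The definitions above translate directly into code. In the sketch below the counts TP = 32, FP = 3, and FN = 2 are illustrative: they are not reported in the text and were chosen only because they reproduce one row of rounded scores (0.914, 0.941, 0.928) that appears in the results tables.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP); defined as 0.0 when nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN); defined as 0.0 when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative counts for a single class (an assumption, see the lead-in).
p = precision(tp=32, fp=3)
r = recall(tp=32, fn=2)
f1 = f1_score(p, r)
```

Because F1 is computed from the unrounded precision and recall, it can differ in the third decimal from a value recomputed from the rounded table entries.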


Table 3. Classification of questions generated by AI tools.


            Copilot                      Gemini                       ChatGPT
Level       Precision  Recall  F1 Score  Precision  Recall  F1 Score  Precision  Recall  F1 Score
Remember    0.914      0.941   0.928     0.655      0.559   0.603     0.824      0.824   0.824
Understand  0.788      0.788   0.788     0.586      0.515   0.548     0.750      0.818   0.783
Apply       0.786      0.647   0.710     0.794      0.794   0.794     0.806      0.735   0.769
Analyze     0.806      0.879   0.841     0.667      0.788   0.722     0.784      0.879   0.829
Evaluate    0.914      0.941   0.928     0.722      0.765   0.743     0.969      0.912   0.939
Create      0.914      0.941   0.928     0.914      0.941   0.928     0.938      0.882   0.909


Based on the experimental results, Copilot achieved an accuracy of 0.856. The classification
report indicates that the Remember, Evaluate, and Create classes all attained the highest F1
score of 0.928, whereas the Apply class had the lowest F1 score of 0.710. ChatGPT
achieved an accuracy of 0.841. Its classification report reveals that the Evaluate class achieved
the highest F1 score of 0.939, whereas the Apply class recorded the lowest F1 score of 0.769.
Gemini achieved a lower accuracy of 0.727. According to its classification report, the
Create class obtained the highest F1 score of 0.928, while the Understand class showed the
lowest F1 score of 0.548.


Averaged over the three chatbots, the Create class shows the highest F1 score (0.921), while the
Understand class shows the lowest (0.706) (see Table 3). This difference can be attributed to
an overlap in content, such as fillers and functional elements, between the
Understand class and the other classes. When cognitive levels are approached informally, outside the
framework defined by Bloom's taxonomy, ‘understanding’ often emerges as
a catch-all concept that covers all cognitive processes. These dynamics indirectly
influence LLMs, which are shaped by human production and dominated by human knowledge.
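The class averages quoted above can be reproduced from the F1 columns of Table 3. The sketch below copies those F1 scores and averages them per class; the dictionary layout is illustrative.

```python
# F1 scores from Table 3 per class: (Copilot, Gemini, ChatGPT).
F1_TABLE_3 = {
    "Remember": (0.928, 0.603, 0.824),
    "Understand": (0.788, 0.548, 0.783),
    "Apply": (0.710, 0.794, 0.769),
    "Analyze": (0.841, 0.722, 0.829),
    "Evaluate": (0.928, 0.743, 0.939),
    "Create": (0.928, 0.928, 0.909),
}

# Unweighted mean F1 per class across the three chatbots.
class_means = {level: sum(f1s) / len(f1s) for level, f1s in F1_TABLE_3.items()}

highest = max(class_means, key=class_means.get)
lowest = min(class_means, key=class_means.get)
```

The Create class comes out highest at about 0.92 and the Understand class lowest at about 0.71, in agreement with the figures quoted in the text up to rounding.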


Table 4. Classification of reference questions with generated questions.


            Copilot                      Gemini                       ChatGPT
Level       Precision  Recall  F1 Score  Precision  Recall  F1 Score  Precision  Recall  F1 Score
Remember    0.914      0.941   0.928     0.655      0.559   0.603     0.824      0.824   0.824
Understand  0.788      0.788   0.788     0.586      0.515   0.548     0.750      0.818   0.783
Apply       0.786      0.647   0.710     0.794      0.794   0.794     0.806      0.735   0.769
Analyze     0.806      0.879   0.841     0.667      0.788   0.722     0.784      0.879   0.829
Evaluate    0.914      0.941   0.928     0.722      0.765   0.743     0.969      0.912   0.939
Create      0.914      0.941   0.928     0.914      0.941   0.928     0.938      0.882   0.909


When the reference questions were used as test data, the model trained
on data generated by Copilot achieved an accuracy of 0.518. The classification report indicates that
the Create class attained the highest F1 score of 0.745, while the Evaluate class had the lowest
F1 score of 0.303. The model trained on data generated by ChatGPT achieved an
accuracy of 0.512. According to its classification report, the Evaluate class achieved the highest
F1 score of 0.698, whereas the Apply class recorded the lowest F1 score of 0.255.
The model trained on data generated by Gemini also achieved an accuracy of 0.512. Its classification
report reveals that the Remember class obtained the highest F1 score of 0.702, while the Analyze class
showed the lowest F1 score of 0.346.


For the Copilot-trained model, which reached the highest accuracy, the Create class shows the
highest performance (0.745), while the Evaluate class exhibits the lowest (0.303). This difference
may stem from the limited range of verbs associated with the Evaluate class, which makes it
necessary to attend to nuances in the rest of the sentence beyond the verb. AI generations,
however, tend to focus predominantly on the
‘verb’ and the extracted ‘topic,’ potentially overlooking broader contextual nuances.


The analyses reveal nuanced performance variations among the AI chatbots in
generating questions aligned with educational frameworks. Copilot demonstrates the highest
overall performance among the AI chatbots evaluated.


Discussion


In this paper, we experimented with categorizing AI-generated questions in the educational
domain, focusing on alignment with the Revised Bloom's Taxonomy (RBT). Our analysis used three AI
chatbots, ChatGPT, Gemini, and Microsoft Copilot, each known for its text generation capabilities
across a wide range of applications. Our methodological approach involved creating an
AI-generated dataset aligned with the cognitive processes of Bloom's Taxonomy using outputs from these
chatbots. We compared this dataset with a reference dataset containing expert questions derived from
scientific curriculum sources and educational literature. For the structural and semantic
comparison of AI-generated versus expert questions, we used BERT to obtain sentence vectors. The
similarity between selected vectors was calculated using cosine similarity as the metric. Applying SVM
as a machine learning classifier provided further insight by extending the analysis to all vectors.


The results suggest that, while chatbots such as Copilot are relatively promising in their potential
to generate educational questions according to cognitive frameworks, the choice among AI models
should still be guided by the specific educational targets. Educators and curriculum developers
thus have a tool for assisting
learners; however, they need to understand that these systems generate content that does not
always capture the expected goals and nuances.


The study has several limitations, including the size of the dataset, the specificity of the
Turkish language, and the rapid evolution of AI models. Future research might build on these findings
by exploring improved training methodologies for aligning AI models with educational
frameworks and by examining the acceptability of AI-generated educational materials. Additionally, incorporating
diverse datasets and evaluating models in various linguistic and temporal settings may enhance the
applicability of the findings.


In conclusion, this research explores the potential and limitations of using AI-generated
questions in education and compares human and artificial intelligence language production. The results
of the experimental study indicate that caution is necessary when applying AI technologies in
educational contexts. Although AI systems can extract specific patterns from data when generating
human-like texts, they often exhibit a uniform pattern and lack nuanced context. As Chomsky et al. (2023)
note, these patterns may remain superficial and dubious compared to human intelligence
because they are not limited by rationality.